ABSTRACT


The evolution of language has been investigated by several research communities, including biologists and linguists, striv- 
ing to highlight similar linguistic capacities across species. To date, however, no consensus exists on the linguistic capac- 
ities of non-human species. Major controversies remain on the use of linguistic terminology, analysis methods and 
behavioural data collection. The field of ‘animal linguistics’ has emerged to overcome these difficulties and attempt to 
reach uniform methods and terminology. This primer is a tutorial review of ‘animal linguistics’. It describes the linguistic 
concepts of semantics, pragmatics and syntax, and proposes minimal criteria to be fulfilled to claim that a given species 
displays a particular linguistic capacity. Second, it reviews relevant methods successfully applied to the study of communication
in animals and proposes a list of useful references to detect and overcome major pitfalls commonly observed in
the collection of animal behaviour data. This primer represents a step towards mutual understanding and fruitful collab- 
orations between linguists and biologists. 
I. INTRODUCTION


How language evolved is a long-standing question in science 
(Christiansen & Kirby, 2003). To answer this question, one 
fruitful strategy is to break human language down into vari- 
ous component abilities (Hauser, Chomsky & Fitch, 2002; 
Fitch, 2005). The phylogenetic distribution of each individ- 
ual component can then be investigated, by comparing com- 
municative capacities across species (Hauser et al., 2002). 
This leads to the identification of homologies (traits inherited 
from a common ancestor) or analogies (traits that fulfil a similar
function, but which have evolved independently). Species
that are phylogenetically close to us (e.g. non-human pri- 
mates) can therefore be studied to understand the evolution- 
ary history of a human capacity. Studies on phylogenetically 
more distant species (e.g. birds) can help us to understand the 
selective pressures that acted on our ancestors and favoured 
the evolution of human communication as it exists today 
(Fitch, 2015). 

In recent years, significant progress has been made in under- 
standing communicative abilities across a number of species 
(e.g. Searcy, 2019). However, the interpretation and linguistic 
relevance of these capacities remain heavily debated (see, for
example, Hauser et al., 2002; Scott-Phillips, 2015b; Schlenker
et al., 2016b; Suzuki, Wheatcroft & Griesser, 2018; Bolhuis
et al., 2018). In some cases, disagreements originate from fundamental
differences in the approach, methods and technical
vocabulary used by researchers involved in purely linguistic
agendas, and those working on the communication of non-human
animals. This is unsurprising: the human linguistic
capacity is an easily observable phenomenon into which we have 
introspective judgments (e.g. whether an utterance is natural, 
and when it can be used; Bolinger, 1968; Marantz, 2005; 
Sprouse, 2013) which can be investigated in a relatively direct 
manner (e.g. we can ask humans about their own practices). Ani- 
mals, on the other hand, possess their own species-specific per- 
ception of the world, cognitive capacities and processes, and 
communication abilities: these cognitive phenomena are only 
accessible to human observers via measures of behaviour, using
ethological methods (Olmstead & Kuhlmeier, 2015). As a result, 
field-specific terminology has emerged: the same term can be 
used to describe slightly different concepts in linguistics and biol- 
ogy (for example, the word ‘syntax’). Moreover, the great differ- 
ences between biological and linguistic methodologies often 
make direct comparisons of results extremely challenging. In 
one striking example, Prat (2019) argued that while ethological 
methodologies have thus far had little success in finding ‘lan- 
guage’ in non-human animals, applying the same methodologies 
to humans also fails to find any sign of ‘language’ in human com- 
municative behaviour. The few attempts at direct communica- 
tion or collaborative efforts between biological and linguistic 
fields have sometimes been highly technical (and thus of limited 
accessibility) or dismissively critical, thus discouraging further
constructive exchanges. As a result, comparisons between 
human language and animal communication have often been 
considered unfruitful and inefficient, and collaboration bound 
for failure. 

Many of these difficulties can be overcome. This can be 
achieved by increasing exchanges and collaborations
between fields, unifying methods and terminology and 
improving the relevance of comparisons between human 
and animal communicative systems. 

There has been a recent increase in collaborative efforts 
between linguists and researchers studying animal communi- 
cation. These have included the search for computational 
properties of language in other vocal and gestural communi- 
cation systems (e.g. Heesen et al., 2019), and the application 
of formal linguistic approaches (e.g. Schlenker et al.,
2016a,c) or computational linguistic approaches
(e.g. Kershenbaum et al., 2014a; Leroux et al., 2021) to primate
vocal communication. These efforts have shown that
the use of linguistic concepts in comparative research with 
animals is both possible and fruitful. 

However, these bridges are still fragile, and are only crossed 
by a handful of researchers. As a way to further encourage this 
enterprise, we offer here a primer to establish strong basic 
foundations for animal linguistics. In particular, we aim to pro- 
vide linguists with the tools to study animal communication, 
and to provide biologists with basic linguistic notions applica- 
ble to the study of animal communication, using concepts and 
criteria compatible with modern linguistic thinking. This 
primer is the product of a collaboration between researchers 
on animal communication and linguists. It can be read as a 
guide for students and researchers of biology and linguistics 
alike: first, we define the linguistic concepts of semantics, prag- 
matics and syntax, in a way that is both biologically and lin- 
guistically relevant; second, we present data-analysis methods 
that have already been successfully applied to animal systems 
to investigate their linguistic properties. We also include a list 
of introductory readings for linguists interested in working 
with animal communication data. A large number of studies 
in animal linguistics focus on primate vocalizations, which 
consequently represent an important part of the examples pre- 
sented here. However, this guide is intended to be applicable 
to all species and communication modalities; we thus encour- 
age readers to investigate the species of their choice and to 
consider communication capacities outside the vocal domain. 


II. DEFINITIONS AND CONCEPTS


Behaviours of humans and non-human animals can be 
explained by a variety of cognitive mechanisms, and similar 
mechanisms can be seen as convincing evidence of evolutionary
continuity between humans and other species. However,
establishing analogies between human and animal processes 
involves ruling out alternative explanations. One useful crite- 
rion is Morgan’s Canon, which states that behaviours should 
not be interpreted as resulting from higher cognitive faculties 
(e.g. theory of mind) if they can be interpreted as the outcome 
of lower capacities (e.g. associative learning)
(Shettleworth, 2010). This principle prevents researchers 
from assigning human-like (and supposedly cognitively more 
complex) capacities to animals without first rejecting alterna- 
tive hypotheses (‘anthropomorphism’). The field of animal 
linguistics greatly benefits from the systematic application of 
Morgan’s Canon by researchers, but the lack of unity in def- 
initions, misunderstandings of linguistic concepts, and misuse 
of linguistic terminology have led to highly debated claims,
despite researchers’ efforts to avoid anthropomorphism. 

In this section, we provide precise definitions of the main 
linguistic concepts (semantics, pragmatics, and syntax; see 
Table 1 for a summary of core concepts), using general principles
that can be applied equally well to human and non-human
communication. For each concept, we provide lists
of criteria that can be used to evaluate a species’ linguistic 
capacity. 

In the present paper, we restrict our scope to the study of 
signals. Signals are potential sources of information that are 
plastically produced at a cost in response to changes in the 
environment, and are improved over evolutionary time to 
best fulfil their communicative function (Bradbury & 
Vehrencamp, 2011). Signals can take different forms: vocal- 
izations, facial or body movements, chemosensory signals, 
etc. Signals contrast with two other sources of information, 
sometimes called ‘signs’ and ‘cues’. Signs, as defined in the 
biological literature, are also evolutionarily shaped to convey
information, but they do so in a permanent fashion
(e.g. warning colours) (Hauser, 1996) (note that this is a distinct
concept from the definition of ‘sign’ used in semiotics
and philosophy of language, see Peirce, 1931). Cues are often
generated for purposes other than communication: for example,
a footprint in the mud conveys information about the
presence of a leopard, but it is not evolutionarily optimized
for communicative purposes.

Table 1. Summary of definitions and concepts in animal linguistics.

Concept                    Definition
Meaning                    The set of features of circumstances that appear at a rate greater than chance across the signal’s occurrences.
Semantic denotation        The largest set of meaningful features of circumstances that appear across all occurrences of the signal.
Pragmatic inference        The meaningful features of circumstances that always appear when the signal is emitted in the presence (or absence) of a given contextual feature.
Syntax                     The set of rules that determine what sequences are well formed.
Compositional syntax       A system in which the meaning of a syntactic structure is derived from the meaning of its parts.
Non-compositional syntax   A system in which the meaning of a syntactic structure is not derived from the meaning of its parts.


(1) Semantics 


Semantics pertains to the meaning of a signal. In its broadest
sense, semantics investigates both the core content of an
expression (later referred to as ‘semantic denotation’) and
the additional inferences that an expression may have
in different contexts (‘pragmatics’) (Chierchia & 
McConnell-Ginet, 1996). For example, upon hearing the 
sentence ‘It is raining’, one understands that there is water 
falling from the sky, but one may make further inferences 
depending on the context of emission: for example, the 
speaker may signal that the laundry should be brought inside, 
or that their interlocutor should take an umbrella. 

A common distinction is the opposition between signals 
that are symbolic parts of a code versus signals that merely cor- 
relate with states of affairs [see Grice (1957) on ‘non-natural’ 
and ‘natural’ meaning]. An intuitively clear example is the 
difference between the utterance ‘I’m happy’, and the act 
of smiling. These two signals appear in similar contexts — 
namely, when the signaller is happy. Nevertheless, there is
an intuitive difference between them: the utterance is a sym- 
bolic code that ‘stands for’ a particular state — it is intention- 
ally uttered to convey a message — while the smile is not. 
According to some (e.g. Scott-Phillips, 2015a), only symbolic
utterances can be considered meaningful and deserve to be 
linguistically investigated. But non-symbolic signals can be 
interpreted by others: a person may react or reply to a smile 
in a similar way to the uttered sentence. These signals can 
also be audience-aware: a person may choose to cover or 
repress their smile if another individual is present. It is still 
unclear to what degree animal signals can be considered sym- 
bolic. For example, some non-human primates have been 
shown to adapt their gestural behaviour to the attentional 
state of the audience (Maille et al., 2012) while the alarm sig- 
nal of crested pigeons (Ocyphaps lophotes) is produced by the 
physical properties of the feathers; the sound occurs when 
the pigeons flap faster to escape predators, irrespective of 
the audience (Murray, Zeil & Magrath, 2017). On the whole, 
while it may be possible to precisely characterize related con- 
cepts like audience-sensitivity, it is not clear that there is any 
well-defined way to characterize ‘symbolic’ meaning, espe- 
cially when one moves to the domain of animal communica- 
tion. One solution to this problem that has been used with 
good results is thus to use a relatively broad characterization 
of meaning (Schlenker et al., 20160) that is not restricted to
symbolic signals, and to describe the properties of these signals
on a case-by-case basis.
In animal communication, the typical pattern of signal
emission is presented in Fig. 1. An individual with its specific 
characteristics witnesses a noteworthy event in its surround- 
ings. This event elicits a temporary emotional or physiologi- 
cal state in this individual. The individual emits a signal, 
which is perceived by a receiver, who reacts by exhibiting a 
behaviour. During or after the signal emission, the signaller 
also displays a behaviour. Both the signaller’s and receiver’s 
behaviours induce changes in the original event: the situation 
at the end of the first signal-loop is not exactly the same as 
before (e.g. all group members are now further from a pred- 
ator, or a receptive female is closer to the signaller) — it is now 
a new event that can trigger a new chain reaction and elicit 
the emission of a new signal. 

The emission of each signal is thus associated with a set of 
circumstances: the external event, the permanent traits of the 
signaller, the transient state of the signaller, the receiver’s 
behavioural reaction, and the signaller’s behaviour. Each of 
these circumstances is characterized by a specific set of fea- 
tures, which may vary between signal occurrences: e.g. the 
type, size, shape or distance of the external object, the sex 
or identity of the signaller, the valence or arousal of the emo- 
tional state of the signaller, the type or strength of the signal- 
ler’s or recipient’s behaviour. For animal communication, we 
define the meaning of a signal as the set of features of
circumstances that appear at a rate greater than chance
across the signal’s occurrences (adapted from Dezecache & 
Berthet, 2018). This allows one to assign meaning to a signal 
based on (i) the features of circumstances in which it is used,
in comparison to (ii) the features of circumstances in which
it is not. For example, if there is an overall 2% chance that 
a leopard is present at any given moment, but when a given 
signal is uttered this chance goes up to 90%, then the feature 
‘presence of a leopard’ is part of the meaning of the signal. 
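
As a minimal sketch of this definition, the comparison can be written as two conditional rates: how often a feature is present when the signal is emitted, and how often it is present when it is not. The observation format, the function name `feature_rates` and the toy counts are hypothetical, chosen only to reproduce the 2% versus 90% leopard example; they are not taken from any study cited here. A real analysis would also need a statistical test of whether the difference exceeds chance.

```python
from fractions import Fraction


def feature_rates(observations, feature):
    """Return how often `feature` occurs with and without the signal.

    `observations` is a list of (signal_emitted, features) pairs, one per
    sampled moment. This data format is a hypothetical choice.
    """
    with_signal = [feats for emitted, feats in observations if emitted]
    without_signal = [feats for emitted, feats in observations if not emitted]
    if not with_signal or not without_signal:
        raise ValueError("need samples both with and without the signal")
    rate_with = Fraction(sum(feature in f for f in with_signal), len(with_signal))
    rate_without = Fraction(
        sum(feature in f for f in without_signal), len(without_signal)
    )
    return rate_with, rate_without


# Toy sample of 500 moments: a leopard is present in 10 of them (2% overall),
# and in 9 of the 10 moments at which the signal is emitted (90%).
toy = (
    [(True, {"leopard"})] * 9
    + [(True, set())] * 1
    + [(False, {"leopard"})] * 1
    + [(False, set())] * 489
)

rate_with, rate_without = feature_rates(toy, "leopard")
overall = Fraction(sum("leopard" in f for _, f in toy), len(toy))

# The feature is a candidate part of the meaning only if it is more frequent
# when the signal is emitted than when it is not.
is_candidate = rate_with > rate_without
```

Exact fractions are used here instead of floating-point numbers so that the toy rates can be compared without rounding.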

It must be noted that this definition applies primarily to 
propositional meaning (meaning similar to that of complete 
sentences in human language, like ‘I am happy’). While this 
provides a necessary first step into a basic understanding of 
animal communication, further types of meaning may be 
necessary to analyse signal combinations, for which it will 
ultimately become crucial to analyse the meaning of the indi- 
vidual parts of the utterances (e.g. the meaning of ‘I’, ‘am’ 
and ‘happy’). To this end, we will return to the meaning of 
the component parts in Section II.2.c, once we have intro- 
duced the notions of syntax and compositionality. 

Finally, a common mistake is to confuse meaning and 
function of a signal. The meaning of a signal belongs to the 
proximate level of explanation, relating to the situations or 
behaviours that directly trigger a communicative signal. In 
contrast, the ultimate explanation of a signal relates to the
adaptive function that it serves on an evolutionary scale. For
example, the (proximate) meaning of an alarm signal might
be ‘there is a predator’ but its (ultimate) evolutionary function
is to attract the attention of social partners, due to its
sharp acoustic parameters. A contact call can mean ‘I have
pacific intentions’ but conveys precise information about
the identity and spatial position of the caller to increase
inter-individual recognition and facilitate group cohesion.
Similarly, some signals, like some birds’ songs, do not appear
to be meaningful, while their function is to display the readiness
of the emitter to defend its territory (Berwick
et al., 2011).

Fig. 1. Semantics summary. A noteworthy external event is witnessed by an individual with specific permanent traits (i.e. long-term
characteristics). This event elicits transient states (i.e. temporary emotional and/or physiological reactions), elicits the emission of a
signal (here represented by a waveform), and elicits a behavioural response (Behaviour S) by the signaller. The receiver performs a
behaviour (Behaviour R) in response to the signal. The emission of the signal is thus associated with a set of circumstances
(represented by the boxes). The semantics of the signal is the set of features of circumstances that appear at a rate greater than
chance across the signal’s occurrences.


(a) Semantic denotation: the core meaning 


In the semantics of human language, the ‘denotation’ of a
word or utterance is commonly defined as its stable semantic 
contribution (Frege, 1892, 1952; Grice, 1957; Katz & 
Fodor, 1963). Here, ‘stable’ indicates that, while a given 
utterance may be used in a variety of different contexts for 
a variety of different purposes, the core meaning of the utter- 
ance — what it denotes — is that part of meaning that always 
stays the same, i.e. what is common across all its many uses. 
For example, although we have seen that ‘It’s raining’ may 
be used to communicate different things (e.g. that the laundry 
should be brought inside), the core meaning of the sentence is 
just that there is rain. 

When applied to animal communication, the semantic 
denotation of a signal is the largest set of meaningful features 
of circumstances that appear across all occurrences of the sig- 
nal. For example, if a signal is produced 70% of the time in 
response to leopards and the other 30% of the time in 
response to eagles, the denotation of the signal is the set of 
features common to all occurrences: presence of a predator. 
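
The definition above amounts to a set intersection, which can be sketched as follows. The function name `semantic_denotation`, the feature labels and the coding of each occurrence as a set are hypothetical. In particular, the sketch assumes that the observer codes a general feature (‘predator’) alongside the specific one (‘leopard’ or ‘eagle’), and that the meaningful features have already been identified by the greater-than-chance criterion.

```python
def semantic_denotation(signal_occurrences, meaningful_features):
    """Meaningful features shared by every occurrence of the signal."""
    occurrences = [set(features) for features in signal_occurrences]
    if not occurrences:
        raise ValueError("at least one signal occurrence is required")
    # Features present in all occurrences, restricted to meaningful ones.
    shared = set.intersection(*occurrences)
    return shared & set(meaningful_features)


# Ten hypothetical occurrences: seven with a leopard, three with an eagle.
occurrences = [{"predator", "leopard"}] * 7 + [{"predator", "eagle"}] * 3
# Features assumed to have already passed the greater-than-chance criterion.
meaningful = {"predator", "leopard", "eagle"}

denotation = semantic_denotation(occurrences, meaningful)
```

With these toy data, `denotation` contains only ‘predator’: neither ‘leopard’ nor ‘eagle’ is present in every occurrence.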

Signals can denote specific features of transient states in 
the signaller, such as the emotional valence or the type and 
intensity of physiological states (fear, hunger, sexual receptiveness
etc.). This seems to be the case for the alarm calls of
vervet monkeys (Chlorocebus pygerythrus), which are produced 
in response to specific classes of predators but also during 
aggression events, probably because different situations elicit 
similar emotional states in the callers (Price et al., 2015; but 
see Schamberg, Wittig & Crockford, 2018). 

Signals can also denote features of the receiver’s behavioural
responses (e.g. activity, latency to react, direction or
distance of movement, etc.). These signals aim at eliciting a
specific response in the receivers, i.e. they are goal-directed
signals (Schamberg et al., 2018). For example, in a group of
wild chimpanzees (Pan troglodytes), all usage of the ‘present 
climb on’ gesture results in the receiver climbing on the sig- 
naller (Hobaiter & Byrne, 2014). Arguably, another example 
comes from the alarm calls of putty-nosed monkeys (Cerco- 
pithecus nictitans): females emit ‘chirps’ to recruit males into 
predator-deterrence behaviour, and stop calling when the 
male spots the predator and starts mobbing it (Mehon & 
Stephan, 2021). 




Signals can denote features of the signaller’s behaviour 
(e.g. activity, latency to react, direction or distance of movement,
etc.), exhibited during or after the emission of the signal.
This is the case for signals that are an indication of
one’s intentions (Cheney & Seyfarth, 2018). In hierarchical 
social groups, interactions can be ambiguous, and signals 
can be emitted to reduce uncertainty about the behaviour 
or intentions of the signaller. In chacma baboons (Papio ursi- 
nus), females emit grunts when approaching other females, 
which conveys information about the pacific intentions of 
females (Silk, Seyfarth & Cheney, 2016). In putty-nosed 
monkeys, males emit ‘pyows’ in response to females’ alarm 
calls, while approaching the rest of the group but before spot- 
ting the predator: they advertise their engagement to defend 
the group against a predator (Mehon & Stephan, 2021).

A signal can also denote features of the external event, 
which comprises the presence of an object (e.g. a predator, 
specific food, a receptive female), a social interaction 
(e.g. inter-group aggression), or a situation (e.g. proximity
to the territorial border). An example is the alarm ‘hoos’ of 
chimpanzees, which denote ambush threats: the presence of 
snakes elicits the emission of alarm ‘hoos’, but emission can 
be suppressed when receivers are already informed, suggest- 
ing that the permanent traits and transient states of the caller, 
the behaviour of the recipient or the behaviour of the caller 
are not denoted by these calls (Crockford et al., 2012; Crockford,
Wittig & Zuberbühler, 2017; see also Girard-Buttoz
et al., 2020). 

Finally, signals can denote features of the permanent traits 
of the signaller, i.e. characteristics of the signaller that remain 
unchanged over long periods of time, such as identity, sex or 
age class. A convincing case is that of bottlenose dolphins 
(Tursiops truncatus), which use whistles that are individually distinctive
and thus convey individual identity. Importantly,
these dolphins can use each other’s whistles to address other 
individuals, and recognize the signature whistle of conspe- 
cifics artificially modified to remove voice characteristics 
(Janik, 2000; Janik, Sayigh & Wells, 2006). We discuss the 
special issues involved in signalling permanent traits in 
Section II.1.c.

The semantic denotation of signals can vary between spe- 
cies: we do not expect all non-human species to adopt one 
unique pattern, since different species face different ecologi- 
cal and social pressures and need signals to communicate 
about a large variety of things. Moreover, we do not expect 
one species to adopt the same strategy for all the signals of 
its repertoire: different signals given by a single species may 
have different semantic denotations because they fulfil differ- 
ent functions (e.g. the social call A of species X can denote 
features of signaller’s intentions while its alarm call B can 
denote features of external events). 

Of course, it is sometimes difficult to disambiguate the 
semantic denotations of signals as cognitive mechanisms 
involved in the signalling remain poorly understood, and fea- 
tures of circumstances are often closely correlated. For exam- 
ple, while the emission of signal X may be strongly correlated 
with the presence of a predator Y, it is difficult to firmly 
establish that X denotes Y (e.g. ‘there is a leopard’), and not
the emotional state associated with Y (‘I am scared’), the invitation
to produce the adaptive response to Y (‘run away’) or
the intention of producing this adaptive response (‘I will run
away’). In some cases, the precise semantic denotation of a
signal can be deduced by careful observations and experi- 
mental tests of all the situations and behaviours associated 
with the emission of the signal, conducted on different popu- 
lations living in different socio-ecological conditions 
(e.g. Schlenker et al., 2014). This approach can be complemented
with direct (e.g. Liao et al., 2018; Mocha &
Burkart, 2021) and indirect (e.g. Schehka &
Zimmermann, 2009) investigations of the signaller’s transient 
state. An alternative approach to clarify the exact semantic 
denotation of a signal is to explore the mental representations
it triggers in receivers (if any). This approach has been successfully
implemented in primates (Zuberbühler, Cheney &
Seyfarth, 1999) and birds (Suzuki, 2018) using experimental 
protocols. Finally, comparing the semantic denotation of dis- 
tinct signals in the repertoire can disambiguate the meaning 
of a given signal (Schlenker et al., 20160; see Section III.5
for more details). However, even after extensive research, it 
may remain impossible to decide between several hypothe- 
ses, because of human or technical limitations. 


(b) Pragmatics: the contextual meaning


In Section II.1, the semantic denotation of a signal was 
defined as its stable semantic contribution — that part of its 
meaning that stays the same across all signal occurrences. 
In contrast, pragmatics pertains to those aspects of meaning 
that are not stable and depend on context. A child who says 
‘I need to pee’ to their parents while riding in a car is asking 
them to find a place to stop, while a teenager who utters the 
same sentence to a sibling taking a long shower is communicating
the message ‘Get out of the bathroom!’. The semantic
denotation of the sentence does not change, but different 
inferences are made depending on the context in which it is 
uttered. 

For pragmatic inferences, as for the semantic denotation, 
meaning is defined with respect to the features of a circum- 
stance that appear at a rate greater than chance, comparing 
situations in which the signal is used to situations in which it is 
not. However, pragmatic inferences depend on contextual 
features. Pragmatic inferences are the meaningful features 
of circumstances that always appear when the signal is emit- 
ted in the presence (or absence) of a given contextual feature. 
Pragmatic inferences are not part of the semantic denotation: 
they are elicited by variations of contextual features, and 
enrich the meaning of the signal beyond its semantic denota- 
tion (see Fig. 2). 
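
The same set-based reading can be sketched for pragmatic inferences: restrict the signal’s occurrences to those in which a contextual feature is present (or absent), keep the meaningful features shared by all of them, and set aside the semantic denotation and the contextual feature itself. The function names, the single-letter feature labels and the four toy occurrences are hypothetical; they only loosely mirror the schematic layout of Fig. 2. Feature A is assumed to have failed the greater-than-chance criterion and is therefore left out of the meaningful set.

```python
def shared_meaningful(occurrences, meaningful_features):
    """Meaningful features present in every occurrence (empty if none given)."""
    if not occurrences:
        return set()
    return set.intersection(*occurrences) & set(meaningful_features)


def pragmatic_inferences(signal_occurrences, meaningful_features,
                         context_feature, present=True):
    """Features always found with the signal in a given context, beyond
    the semantic denotation and the contextual feature itself."""
    occurrences = [set(features) for features in signal_occurrences]
    # Keep only occurrences where the contextual feature is present (or absent).
    in_context = [f for f in occurrences if (context_feature in f) == present]
    denotation = shared_meaningful(occurrences, meaningful_features)
    always_in_context = shared_meaningful(in_context, meaningful_features)
    return always_in_context - denotation - {context_feature}


# Four hypothetical occurrences of one signal, coded as sets of features.
occurrences = [
    {"A", "B", "C", "D"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "F"},
    {"A", "B", "E"},
]
# A is assumed to be present whether or not the signal is emitted.
meaningful = {"B", "C", "D", "E", "F"}

in_context_c = pragmatic_inferences(occurrences, meaningful, "C")
without_c = pragmatic_inferences(occurrences, meaningful, "C", present=False)
```

With these toy data, D is the only inference elicited by the presence of C, and no inference is elicited by its absence; B is excluded because it already belongs to the semantic denotation.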

In the example above, the signal ‘I need to pee’, when 
uttered in the car, is meaningfully associated with a feature 
of the receiver’s behavioural response — the parent is signifi- 
cantly more likely to stop the car than if the child had said 
nothing. On the other hand, this behavioural response disap- 
pears if the child is talking to their sibling in the shower. The 
receiver’s behaviour ‘stop the car’ is thus a pragmatic inference
that is elicited by the contextual feature ‘where is the
sentence uttered?’, but it is not part of the semantic 
denotation. 

Pragmatic inferences involve reasoning both for the signaller
and the receiver. For the receiver, the question is: ‘why
was this particular signal used and not another, and why
was it uttered now?’ For the signaller, it is the opposite question:
‘which signal should I use, and when?’ Because pragmatics
involves reasoning about communicative acts, it is
often taken to interact with theory of mind — that is, the abil- 
ity to entertain theories about why others do what they do, 
for instance by attributing to them some mental state 
(Premack & Woodruff, 1978). This ability is implicitly pre- 
sent in one way of describing pragmatics: ‘what is the speaker 
trying to say?’ However, as highlighted by Schlenker 
et al. (2016b), pragmatic reasoning does not have to require
a theory of mind. Instead, it can rely on simple associative 
learning. For example, if an office worker hears people run- 
ning down the hallway, there are a number of possible expla- 
nations: there could be a fire or there could be free pizza. The 
office worker nevertheless is likely to conclude that there is 
not a fire, due to the absence of a more specific signal: the fire 
alarm has not gone off. The office worker is thus reasoning 
about the state of the world, based on the absence of a signal 
(fire alarm) which is normally associated with a circum- 
stance’s feature (fire); this certainly does not require the rea- 
soner to have a theory of mind of fire alarms. Pragmatic 
principles relying on strong associations between signals 
and circumstances’ features have been applied to the call sys- 
tems of several primates, without any assumption about their 
theory of mind. For example, Schlenker et al. (2016a) propose
that, in male Campbell’s monkeys (Cercopithecus campbelli), 
‘krak’ calls denote all kinds of predators (aerial or terrestrial), 
and ‘hok’ calls denote aerial predators. In many cases, 
though, recipients infer that a terrestrial predator is present 
when hearing a ‘krak’, because if an aerial predator was pre- 
sent, a ‘hok’ would have been emitted. This reasoning is sim- 
ply based on the animals’ knowledge of their signal repertoire 
and their circumstances of use, and does not presuppose any 
theory of mind. This said, accumulating evidence suggests 
that some non-human animals (e.g. great apes) do possess a 
theory of mind (e.g. Krupenye et al., 2016; Kano
et al., 2019), which they could use for pragmatic reasoning. 
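
The ‘krak’ and ‘hok’ reasoning can be illustrated with a small sketch in which a call’s enriched meaning is its denotation minus the situations for which a strictly more specific call exists in the repertoire. This is a simplified illustration of the reasoning pattern described above, not the formal analysis of Schlenker et al.; the function name `enriched_meaning` and the two-call repertoire with two situation labels are assumptions made for the example.

```python
def enriched_meaning(call, denotations):
    """Situations compatible with `call` once situations covered by a
    strictly more specific call in the repertoire are excluded."""
    own = denotations[call]
    excluded = set()
    for other, situations in denotations.items():
        # `<` tests for a proper subset: the other call is more specific.
        if other != call and situations < own:
            excluded |= situations
    return own - excluded


# Simplified denotations following the description in the text.
denotations = {
    "krak": {"aerial predator", "terrestrial predator"},
    "hok": {"aerial predator"},
}

krak_meaning = enriched_meaning("krak", denotations)
hok_meaning = enriched_meaning("hok", denotations)
```

Here ‘krak’ is enriched to ‘terrestrial predator’ only, while ‘hok’ keeps its denotation. The computation uses nothing but the repertoire and the circumstances of use, with no representation of the signaller’s mental state.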

Non-human pragmatics should therefore be investigated 
through one guiding question: what kinds of information 
can be incorporated and reasoned about in the communica- 
tion of different species? Some kinds of information may 
depend on low-level cognitive functions, such as basic per- 
ception; other kinds of information may constitute higher 
level cognitive functions, such as representing the knowledge 
or intentions of others [see Scott-Phillips (2017) on ‘weak 
pragmatics’ versus ‘strong pragmatics’].

Fig. 2. (A) Schematic representation of the semantic denotation. Each circle represents a situation in which a given signal was emitted
(left) or not (right). Each of these situations is characterised by circumstances that have specific features (letters). The semantic
denotation of the signal is the largest set of meaningful features of circumstances that appear across all occurrences of the signal:
here, only B fulfils this criterion. A is always present, whether or not the signal is emitted: it is not associated with the signal above
chance, so it is not a meaningful feature. C, D, E and F are not always present when the signal is emitted, so they are not part of
the semantic denotation of the signal. (B) Schematic representation of a pragmatic inference. What pragmatic inferences are
elicited by the contextual parameter C? To answer this, we only look at situations in which C is present (blue circles). The
inferences of the signal in the context of C are the meaningful features of circumstances that always appear when C is present.
Here, only D fulfils this criterion. B is part of the semantic denotation. A is not associated with the signal above chance, so it is not
meaningful. E is not always present when the signal is emitted while C is present, so it is not a pragmatic inference of the signal in
the context of C. An example of these features in humans, with the signal ‘I need to pee’: A, the speaker is in good health; B, the
speaker’s bladder is full; C, the speaker is in a moving car; D, the car stops; E, the speaker is wearing a red T-shirt; F, the speaker’s
sibling offers a drink to the speaker. An example of these features in putty-nosed monkeys, with the signal ‘hack’: A, the signaller is
in a tree; B, there is a general alert; C, a tree falls; D, the recipient does not look upwards; E, there is a feeding tree nearby; F, the
recipient grooms the signaller.

First, pragmatic inferences can be elicited by the presence
(or absence) of directly observable contextual parameters.
These directly observable contextual parameters can be
external events. For example, in a playback study, female
putty-nosed monkeys were shown to react differently to male
calls depending on observable properties of the environment,
like the presence of noise of a falling tree or acoustic cues of a
predator’s presence [Arnold & Zuberbühler (2013); see also
Arnold & Bar-On (2020) for a discussion]. The semantic
denotation of the signal itself remained the same (it is a general
alert), but the recipient enriched the meaning of the signal
(i.e. it modified its behaviour) based on observable
features of the circumstances co-occurring with emission of
the signal. Similarly, Diana monkeys (Cercopithecus diana) react
to conspecifics’ alarm calls differently depending on the emission
of prior calls or the presence of environmental cues
(e.g. previous signs of the presence of a predator)
(Zuberbühler et al., 1999). Pragmatic inferences can also be
elicited by variations of the signaller’s behaviour, like its gaze
direction: when an event (e.g. the presence of a predator)
elicits the emission of a signal that does not denote the event’s
location, receivers can retrieve location information from the
signaller’s behaviour (e.g. infer that the predator is in the canopy
if the signaller is looking upwards) (Davidson et al., 2014).
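
The set-based reading of Fig. 2 can be made concrete with a short sketch. The Python code below is an illustration only, not an analysis method taken from the studies cited here: the situations are hypothetical, the function names are ours, and ‘associated above chance’ is crudely approximated by excluding any feature that is present in every situation without the signal. Real data would require a proper statistical test of association.

```python
from typing import FrozenSet, Iterable, List, Set

Situation = FrozenSet[str]


def _shared(situations: Iterable[Situation]) -> Set[str]:
    """Features present in every situation of the collection."""
    situations = list(situations)
    if not situations:
        return set()
    return set.intersection(*(set(s) for s in situations))


def semantic_denotation(with_signal: List[Situation],
                        without_signal: List[Situation]) -> Set[str]:
    # Features present at every emission, minus background features that are
    # also always present without the signal (crude "above chance" proxy).
    return _shared(with_signal) - _shared(without_signal)


def pragmatic_inferences(with_signal: List[Situation],
                         without_signal: List[Situation],
                         context: str) -> Set[str]:
    # Restrict to emissions that co-occur with the contextual parameter.
    in_context = [s for s in with_signal if context in s]
    return (_shared(in_context)
            - _shared(without_signal)
            - semantic_denotation(with_signal, without_signal)
            - {context})


# Hypothetical situations, using the feature letters of Fig. 2.
WITH_SIGNAL = [frozenset("ABCD"), frozenset("ABCDE"),
               frozenset("ABF"), frozenset("ABE")]
WITHOUT_SIGNAL = [frozenset("AE"), frozenset("AF"), frozenset("AC")]

print(semantic_denotation(WITH_SIGNAL, WITHOUT_SIGNAL))        # {'B'}
print(pragmatic_inferences(WITH_SIGNAL, WITHOUT_SIGNAL, "C"))  # {'D'}
```

With these toy data, A is discarded as a background feature, B is the denotation, and D is the only inference tied to context C, mirroring the figure.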

Second, pragmatic inferences can be elicited by the pres- 
ence (or absence) of contextual parameters that are not 
directly observable by the signaller or the receiver. These 
include, for example, the representation of the group’s social 
structure and the memory of past social interactions 
(e.g. Bergman & Sheehan, 2013; Wittig et al., 2014). For 
example, chimpanzees react differently to aggressive barks 
of conspecifics that are closely bonded to a subject’s former
opponent: individuals can thus enrich a signal’s semantic
denotation (e.g. an aggressive interaction) with social knowledge
(e.g. friendship and social structure of the community)
and past personal history (e.g. having been subject to aggression
from a conspecific) (Wittig et al., 2014). Another example
comes from the representation of third-party relationships: in
chimpanzees, victims of severe attacks produce screams
whose acoustic structure exaggerates the level of aggression
experienced if the audience includes at least one listener
whose rank matches or surpasses that of the aggressor
(Slocombe & Zuberbühler, 2007). Finally, these can include
representations of the knowledge or belief states of others:
for example, wild chimpanzees modulate alarm calling and
other communicative behaviour as a function of conspecifics’
knowledge (Crockford et al., 2012).

We note that it may sometimes be difficult to disentangle 
pragmatic inferences from semantic denotations, especially 
when it is difficult to decide whether two different forms are 
occurrences of the same signal (see for example Kuhn 
et al., 2018). For example, black-fronted titi monkeys (Callice- 
bus nigrifrons) possess two acoustic variants of the alarm B-call: 
one higher-pitched variant is given in response to terrestrial 
predators, and a lower-pitched one when the caller is des- 
cending to the ground (Berthet et al., 2018). One hypothesis 
is that B-calls are two different calls, with different semantic 
denotations: the lower call means ‘I am going to the ground’ 
and the higher call means ‘there is a terrestrial predator’. An 
alternative hypothesis is that B-calls have one semantic deno- 
tation, regardless of their acoustic structure (e.g. ‘I am 
afraid’), but that contextual parameters act on their acoustic 
structure and slightly modify their meaning (e.g. ‘I am a little
afraid’, when the caller is going near the ground, versus ‘I am
very afraid’, when a terrestrial predator is present). This
question can possibly be answered by investigating whether
listeners consider the two variants as graded variations of
one signal, or as two different signals (see Section III.1).


(c) Communicating information about a signaller’s permanent traits


It is common for a signaller’s characteristics (e.g. identity, 
size, sex) to influence the shape of the signal (e.g. the body size 
of the caller influences the fundamental frequency of its calls) 
and transmit reliable information about the signaller to 
receivers (e.g. Ey, Pfefferle & Fischer, 2007; Bowling 
et al., 2017). It may be attractive to consider this information 
as part of the semantic denotation of the signal, similarly to 
individual names or age labels in human language. The pre- 
sent approach gives a slightly different perspective on such 
cases. 

First, under the present approach, the semantic denotation of a
signal is defined as the largest set of meaningful features of 
circumstances that appear across all occurrences of the sig- 
nal. Notably, though, permanent features of the signaller 
are always present, even when the signal is not emitted; they 
thus do not appear at a rate greater than chance, so cannot 
be part of the semantic denotation of a signal. On the other 
hand, signals with a tautological denotation (i.e. signals that
are always true) may have the function of conveying the identity
and location of an individual: here, the choice to use the
signal may itself generate pragmatic inferences. In English,
for example, the sentence ‘I’m here’ is true no matter who
or where the speaker is; under the present framework,
the sentence has only a trivial semantic denotation. A speaker
may nevertheless decide to use the utterance (instead of 
remaining silent) to elicit a pragmatic inference in the 
receiver (e.g. the receiver approaches the voice source). Sim- 
ilar reasoning may apply to contact or territorial songs of ani- 
mals, which are likely not meaningful but have an attractive 
or defensive function. This is, for instance, likely to be the 
case for giant otters (Pteronura brasiliensis) whose contact calls 
function to maintain socio-spatial cohesion and reliably convey
caller identity (Mumm, Urrutia & Knörnschild, 2014).
Receivers can detect the caller’s location and identity, from 
which they draw pragmatic inferences about appropriate 
behaviour (e.g. to approach the caller or not). 

Second, semantic investigations are conducted across all 
occurrences of a signal. This implies that the object of study 
is an idealized, stable shape of the signal that is not impacted 
by the conditions of production: the semantic denotation of 
the signal remains the same across all occurrences, regardless 
of the signaller’s traits. A point of contrast can thus be drawn 
between identity information drawn indirectly from a signal 
(e.g. David Attenborough saying ‘It’s me’ with his distinctively
recognizable voice) versus identity information that is
part of the denotation itself (e.g. David Attenborough saying, 
‘It’s David Attenborough’). If identity information is part of 
the denotation itself, the meaning should remain even when 
the signal is emitted by another individual (e.g. anyone else 
saying ‘It’s David Attenborough’ to refer to the famous 
biologist). 

As a consequence, few animal systems qualify so far as 
semantic denotations of permanent traits. Earlier, we men- 
tioned the case of bottlenose dolphin whistles: since these sig- 
nals convey stable information about permanent traits even 
when the vocal characteristics of the signaller have been 
removed (Janik et al., 2006), they can be said to semantically
denote identity. 


(d) The case of deception 


In the definitions above, we have assumed that all animal sig- 
nals are produced truthfully. This simplification is made out 
of necessity: since direct introspective methods are not possi- 
ble for non-human animals, meaning must be defined (at a 
first pass) via features of the real world. In reality, though,
an emitted signal may be false (i.e. the signal is not emitted
in the set of circumstances with which it is normally correlated),
either accidentally or as an attempt to deceive. 

Deception occurs when an individual produces a signal 
whose reception will benefit it at the expense of the receiver. 
Deception has been observed in a wide range of species. For 
example, fork-tailed drongos (Dicrurus adsimilis) produce 
alarm calls to threats that can be understood by sympatric 
species, but they also use the same alarm calls in non-threat
contexts to scare away these animals and steal their food
(Flower, Gribble & Ridley, 2014). Mantis shrimp (Gonodactylus
bredini) produce meral spread threat displays to drive off
conspecific opponents, even when they are newly moulted 
and thus vulnerable to attack (Adams & Caldwell, 1990). 
Tufted capuchins (Sapajus apella) sometimes produce alarm 
calls during feeding events, which elicits an escape reaction 
from conspecifics and allows the caller to access food (Kean 
et al., 2017). Notably, such examples (as well as deception in 
human language) provide a challenge for a framework like 
the one we have presented above, in which the meaning of 
a signal is defined relative to the observed circumstances of 
its use, since in cases of deception these circumstances might 
not be found. However, we believe that these special cases 
are not a major limitation. 

First, for deceptive communication to be effective, a signal 
can only rarely be emitted in the wrong context. High rates of 
unreliable signalling may put selective pressure on receivers 
to learn that the sender is not trustworthy and ignore their 
signals (Wheeler & Hammerschmidt, 2013). It is thus likely 
that, if a species displays deceptive communication, these 
cases will nonetheless remain rare, with relatively little 
impact on the statistical evaluation of meaning. 

Second, deceptive communication can still provide insight 
into the meaning of a signal, by using the relationship 
between the semantic denotation and the pragmatic infer- 
ences. Namely, when one lies, one nevertheless expects the 
recipient to behave as though one were telling the truth 
(e.g. uttering ‘I need to pee’, in order to escape a boring
class). On the present framework, the receiver’s behaviour
is generally a pragmatic inference (provided the signal does
not denote the receiver’s behavioural response). Because a
receiver has no way of knowing whether a signal is truthful
or not, this particular inference (the receiver’s behavioural
response) will remain constant across signal emissions, even
if the semantic denotation is not true. The stability of certain 
pragmatic inferences in a given context thus provides an ave- 
nue to hypothesize about the meaning of a signal. One sees 
that recipients react as though X were the case, even if it 
actually is not (this logic underlies the playback methodology, 
in which one observes reactions of an animal to a false signal). 
But notably, there is no silver bullet for working backwards 
from the pragmatics to the denotation: this requires theories 
of pragmatics (see Section III.5) and theories of animal
behaviour, and likely varies from species to species. 


(2) Syntax 


Syntax describes the set of rules that determine what 
sequences are well-formed, and what sequences are not. It 
is a combinatorial system: that is, it combines and orders 
units into sequences. 

In its most general sense, syntax does not require a semantics;
that is, neither the units being combined nor the resulting
sequence necessarily have to be meaningful. Consider
sequences of parentheses that must be first opened and later 
closed: the sequences ()() and (()) are well formed, but the
sequences (() and ))(( are not. In animal communication, the
presence of syntax without a semantic interpretation has 
been suggested for birdsong: it is possible to describe a set 
of rules (i.e. a syntax) that describes which sequences of notes 
are well formed and which are not (Berwick et al., 2011), but 
neither the individual notes nor the resulting sequences bear 
distinct meanings on the definitions above. 
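
The parenthesis example shows that well-formedness can be decided without any reference to meaning. A minimal Python sketch of such a purely syntactic check is given below; the function name `is_well_formed` is our own choice, and the rule it implements is simply the one stated above (every parenthesis must be opened before it is closed, and all must be closed by the end).

```python
def is_well_formed(sequence: str) -> bool:
    """Return True if every '(' is later closed and no ')' closes too early."""
    depth = 0  # number of parentheses currently open
    for symbol in sequence:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared with nothing left to close
                return False
        else:
            raise ValueError(f"unexpected symbol: {symbol!r}")
    return depth == 0  # nothing may remain open


for candidate in ["()()", "(())", "(()", "))(("]:
    print(candidate, is_well_formed(candidate))
```

The check accepts the first two sequences and rejects the last two, yet none of the four sequences means anything.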

However, many syntactic systems do interface with seman- 
tics. For humans, it has been observed that natural language 
involves two distinct combinatorial systems, acting at two dif- 
ferent levels (Marler, 1977; Pullum & Zwicky, 1988; Collier 
et al., 2014). At a first level, phonology combines articulatory 
units, the phonemes (sounds in spoken language or body 
movements in sign languages) into words. For instance, 
English phonology determines that ‘plimp’ is a well-formed 
sequence (even if it is not a real word), but ‘lpipm’ is not.
At a second level, (sentential) syntax combines words into 
sentences. The rules of English syntax, for example, deter- 
mine that ‘The bird is singing’ is a well-formed sequence, 
but that ‘Singing the is bird’ is not. (These levels can be fur- 
ther refined to additionally include morphology, which com- 
bines roots and affixes into words, such as sing+ing.) Notably, 
for human language, sentential syntax interfaces with seman- 
tics: words bear meaning, and the meaning of a sentence is 
derived from the way that these meanings are combined via 
the syntax (e.g. ‘Alex ate the chicken’ and ‘The chicken ate
Alex’ involve the same units, but receive rather different 
interpretations). 

The term ‘syntax’ is used ambiguously in the literature: it
sometimes refers to combinatorial systems in general (including
birdsong and human phonology), and sometimes to the specific
system that combines words into sentences in human language. To
disambiguate, we will call the latter ‘sentential syntax’.

The fact that clear formal properties distinguish these two 
levels of combination in human language allows further dis- 
tinctions to be made. For example, idioms are expressions 
that are built via the sentential syntax, but whose meaning
is not derived from the meaning of their parts. For example, 
the idiomatic meaning of ‘spill the beans’ is unrelated to the 
meaning of ‘beans’, but it is nevertheless an output of the syn- 
tactic system and not the phonological system (i.e. it is not a 
single word ‘spilthabeens’), due to its interaction with the 
sentential syntax in other ways (such as tense: ‘spilled the 
beans’). Some specific cases complicate the classification of
combinations. For example, some sequences that are gener- 
ated by the phonological system may nevertheless seem to 
contain the units of sentential syntax or idioms: for example, 
the word ‘candid’ can be decomposed into the sounds ‘can’ 
and ‘did’, which are themselves both words, but this is 
completely accidental. Another difficult case is due to lan- 
guage evolution. For example, the historical etymology of 
‘daisy’ is the idiomatic ‘day’s eye’, derived from sentential 
syntax, but this has been reanalysed as phonological structure 
over time. 

These examples illustrate that the distinction between 
phonology, sentential syntax and idioms in human language 
relies on a clear delineation of their properties, which are well
known and understood. For animal communication, such a
distinction may be premature, for our lack of understanding
of these systems prevents us from drawing a clear delineation
between these concepts. As a result, the distinction that is
commonly and productively made for animal communica- 
tion is whether or not a meaningful combination is semanti- 
cally compositional: that is, whether or not the meaning of 
the whole is derived from the meaning of the parts 
(Frege, 1892, 1952). Both compositional and non- 
compositional combinations have functional and ecological 
value. Semantically non-compositional combination allows 
a large and arbitrary vocabulary to be generated by a small 
set of units (Fitch, 2019). Semantically compositional combi- 
nation allows simple meanings to combine to produce more 
complex concepts without needing to memorize each concept
individually (Collier et al., 2014). We might thus expect
to observe both kinds of combination in animal communica- 
tion systems. 

We propose a two-step procedure to investigate syntactic 
properties of non-human systems, summarized in Fig. 3. 
First, we propose two criteria to detect syntactic structures. 
Second, we present methods to investigate the interface 
between the syntactic structure and semantics, to qualify 
the degree of compositionality of the syntactic structure. 


(a) Step 1: detecting syntactic structures


Syntax describes the combination of units into sequences. 
The first step of the detection of a syntactic structure consists 
of identifying individual units and the sequences in which 
they can appear, and verifying that signals that appear in 
two different sequences are perceived as the same unit (stage 
1 in Fig. 3). In other words, is X₁ in the sequence X₁A the
same unit as X₂ in the sequence X₂B? One frequent methodology
is to test animals’ reactions to artificially created
sequences, generated by replacing signals from one sequence
with signals from a different sequence: if X₁ and X₂ are perceived
as the same unit, the recipients should react similarly
to X₁A and X₂A, and to X₂B and X₁B. For example, male
Campbell’s monkeys produce ‘krak’ alarm calls as well as
‘krakoo’ calls, which appear to be the combination of the
‘krak’ call with an ‘-oo’ ending (Ouattara, Lemasson &
Zuberbühler, 2009a). To test whether the ‘krak’ in both cases
is perceived as the same unit, Coye et al. (2015) generated
artificial ‘krak’ and ‘krakoo’ calls by either adding an ‘-oo’
to a ‘krak’ call or by removing an ‘-oo’ from a ‘krakoo’ call.
The authors found that Diana monkeys, which associate with
Campbell’s monkeys, responded similarly to both the natural
and the artificial calls, showing that the two ‘krak’s are perceptually
the same. Methods for establishing a signal repertoire
are discussed further in Section III.1.
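
The substitution design described above amounts to crossing every recorded token of the candidate unit with every frame in which it can occur. The Python sketch below enumerates the resulting playback stimuli; the function `build_stimuli`, the token labels and the dictionary layout are hypothetical conveniences of ours, not part of any published protocol.

```python
from itertools import product
from typing import Dict, List


def build_stimuli(unit_tokens: Dict[str, str], frames: List[str]) -> List[dict]:
    """Cross every recorded token of the shared unit with every frame.

    unit_tokens maps a token label (e.g. "X1") to the frame it was recorded
    in (e.g. "A"). A stimulus is natural when the token is paired with its
    own frame, and artificial otherwise.
    """
    stimuli = []
    for (token, own_frame), frame in product(unit_tokens.items(), frames):
        stimuli.append({
            "token": token,
            "frame": frame,
            "natural": own_frame == frame,
        })
    return stimuli


# X1 was recorded in the sequence X1A, X2 in the sequence X2B.
STIMULI = build_stimuli({"X1": "A", "X2": "B"}, ["A", "B"])
for stimulus in STIMULI:
    print(stimulus)
```

If recipients respond alike to the natural and artificial version of each frame, this supports treating X₁ and X₂ as the same unit.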

Once combinatorial units are identified, the next stage is
to investigate the rules of combination (stage 2 in Fig. 3). A
typical methodology consists of drawing hypotheses about
the rules of combination from a large combination data set,
then testing their validity by creating artificial combinations
that disrupt the hypothesized combination rules: if recipients
react differently to disrupted combinations and original combinations,
then the rules of combination matter. One common
combination rule is order. In human language, for
example, ‘the man’ is a well-formed noun phrase while 
‘man the’ is not; ‘blue sky’ and ‘sky blue’ are both well 
formed but do not mean the same thing. Many animal com- 
munication systems show a similar importance of order in the 
syntactic system. The Japanese great tit (Parus minor) combines
alert calls and recruitment calls into an alert-recruitment
sequence structured according to an ordering
rule: if the alert and recruitment calls are reversed (i.e. a
recruitment-alert sequence), then receivers do not react
(Suzuki, Wheatcroft & Griesser, 2016). On the other hand, 
in other animal communication systems, order may be less 
important (Engesser & Townsend, 2019). For example, 
alarm sequences of titi monkeys are structured according to 
rules of proportions of consecutive call types (Berthet 
et al., 2019b), while those of black-capped chickadees (Poecile
atricapilla) rely on repetitions of elements (Templeton, 
Greene & Davis, 2005). 
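
One possible way to compare reactions to original and disrupted combinations is an exact permutation test on a binary response score. The Python sketch below is an assumption on our part, not the analysis used in the studies cited above: the scores are invented, the function name is ours, and the test treats trials as independent, which real playback data often are not.

```python
from itertools import combinations
from statistics import mean
from typing import List


def exact_permutation_p(original: List[int], disrupted: List[int]) -> float:
    """Two-sided exact permutation p-value for a difference in mean response."""
    pooled = original + disrupted
    n_original = len(original)
    observed = abs(mean(original) - mean(disrupted))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling n_original trials as "original".
    for chosen in combinations(range(len(pooled)), n_original):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen_set]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical scores: 1 = receiver approached, 0 = no approach.
ORIGINAL_ORDER = [1, 1, 1, 1, 1, 1]    # e.g. playback of AB
REVERSED_ORDER = [0, 0, 0, 0, 0, 0]    # e.g. playback of BA
print(exact_permutation_p(ORIGINAL_ORDER, REVERSED_ORDER))
```

A small p-value would indicate that the disrupted rule (here, order) affects receivers' responses; full enumeration is only practical for small samples.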


(b) Step 2: qualifying syntactic structures 


After establishing cases of syntactic combination, one can 
determine whether a combination is semantically composi- 
tional. As we have seen, human phonology is non-composi- 
tional: the meaning of ‘candid’ is not related to the 
meanings of ‘can’ and ‘did’. On the other hand, sentential 
syntax generally is compositional: the meaning of the sen- 
tence ‘John left’ is derived from the meaning of ‘John’ and 
that of ‘left’. 

Determining whether a combination is semantically com- 
positional can only be done on combinations that are mean- 
ingful, i.e. that possess a semantic denotation as defined in 
Section II.1.a (stage 3 in Fig. 3). One of the strongest tools
to identify compositionality is productivity (Baayen, 1992;
Szabo, 2020). Semantic compositionality implies that a signal
can be used in different syntactic combinations (e.g. the word
‘elephant’ can be used in the sequences ‘a big elephant’ and
‘the elephant’), and productivity is what allows one to interpret
the signal in all of these combinations, including entirely
novel sequences (e.g. ‘The one-eyed elephant is eating blue
popcorn’). Productivity thus implies that (a) a signal contributes
the same meaning in different sequences, and (b) one can
produce and interpret novel sequences. For example, the
English suffix ‘-proof’ can be combined with many different
nouns, always contributing the same meaning (‘bullet-proof’,
‘water-proof’, etc.), and can even be applied to words that
only recently appeared in the English language (‘Covid-proof’).

Fig. 3. Detection and qualification of syntactic structures. The analysis steps are illustrated with a fictive system in which two signals A
and B can be combined into an AB sequence. Flowchart labels recoverable from the figure: 1. Description of the combinations (a. description of the signal system; b. signals appearing in different combinations are perceived as the same units); 2. Description of the combining rules (disrupted combinations are different combinations; example: ‘order matters’, AB is not perceived as BA). The stages are grouped under ‘Detection of the syntactic structure’ and ‘Qualification of the syntactic structure’.

Productivity can be identified and quantified in non-human
systems, in a two-step procedure. The first step
(4a in Fig. 3) is to verify that a signal can be used with the
same meaning across different combinations, and identify
the possible combinations. This can be achieved through naturalistic
observations: the greater the number of combinations
in which a signal appears, the greater the productivity
of the system. Campbell’s monkeys have two different alarm
calls, ‘krak’ and ‘hok’, where ‘hok’ is specific to aerial disturbances
(Ouattara, Lemasson & Zuberbühler, 2009a). These
calls can also be followed by a suffix, ‘-oo’: ‘krak-oo’ indicates
a weak disturbance; ‘hok-oo’ indicates a weak aerial
disturbance (Ouattara et al., 2009a). The ‘-oo’ suffix is productive
because it contributes the same meaning in two different
sequences: in both cases, it attenuates the level of
danger. However, the small size of the inventories involved
results in relatively weak productivity; an alternative explanation
could be that the animals have memorized distinct
meanings for each call sequence (Kuhn et al., 2018).
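
As a rough illustration of the naturalistic step, one can count the number of distinct partners with which each unit combines in a corpus of observed sequences. The Python sketch below uses a hypothetical mini-corpus built from the Campbell’s monkey call labels; `partner_counts` is our own name, and the count says nothing about whether the unit keeps the same meaning across combinations, which must be checked separately.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def partner_counts(sequences: Iterable[Tuple[str, ...]]) -> Dict[str, int]:
    """Number of distinct other units each unit is observed to combine with."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for sequence in sequences:
        for unit in sequence:
            # Units emitted alone are registered with an empty partner set.
            partners[unit].update(u for u in sequence if u != unit)
    return {unit: len(others) for unit, others in partners.items()}


# Hypothetical observations: calls emitted alone or followed by the suffix.
OBSERVED = [("krak",), ("krak", "oo"), ("hok",), ("hok", "oo"), ("krak", "oo")]
print(partner_counts(OBSERVED))
```

Here the suffix combines with two different calls while each call combines with only one partner, which matches the small inventory, and hence the relatively weak productivity, noted above.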

The second step (4b in Fig. 3) is to verify whether non- 
human animals can understand novel sequences using com- 
positional syntax: if so, the degree of productivity is high. 
This can be achieved with an experimental paradigm. For 
example, Japanese great tits produce ABC-calls in response 


to predators, and listeners respond to this call by scanning 
the area. D-calls are recruitment calls, emitted in non- 
dangerous situations to attract the receiver. These calls can 
be combined into an ABC-D sequence that combines the
two meanings: it is emitted in presence of predators to recruit 
conspecifics for mobbing, and receivers approach the signal- 
ler while scanning the area (Suzuki, Wheatcroft & 
Griesser, 2020). These patterns display at least a weak degree 
of productivity, since the semantic contribution of both 
ABC-calls and D-calls is the same across different sequences 
in which they appear. To investigate a higher degree of pro- 
ductivity, Suzuki, Wheatcroft & Griesser (2017) further arti- 
ficially combined Japanese tits’ ABC-calls with the 
recruitment call (‘tää’) of willow tits (Poecile montanus), which
is used to attract both conspecifics and heterospecifics, 
including Japanese tits. Japanese tits responded similarly to 
these entirely novel sequences and to natural ABC-D calls
(approach while scanning), thus displaying productivity with
novel examples. This high degree of productivity is strong 
evidence for semantic compositionality. 
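
The logic of this experiment can be sketched as a toy compositional interpreter: each unit contributes a fixed component of the response, and the response to a sequence is the combination of its parts. The Python code below is only a schematic illustration under strong assumptions of ours: the lexicon entries are hypothetical labels, conjunction (set union) stands in for the composition rule, and the ordering rule discussed earlier is ignored.

```python
from typing import Dict, FrozenSet, List

# Hypothetical lexicon: each unit is paired with the response it elicits.
LEXICON: Dict[str, FrozenSet[str]] = {
    "ABC": frozenset({"scan"}),
    "D": frozenset({"approach"}),
    "willow_tit_recruitment": frozenset({"approach"}),
}


def interpret(sequence: List[str],
              lexicon: Dict[str, FrozenSet[str]] = LEXICON) -> FrozenSet[str]:
    """Derive the predicted response to a sequence from its parts."""
    meaning: FrozenSet[str] = frozenset()
    for unit in sequence:
        if unit not in lexicon:
            raise KeyError(f"no meaning recorded for unit {unit!r}")
        meaning = meaning | lexicon[unit]
    return meaning


natural = interpret(["ABC", "D"])
novel = interpret(["ABC", "willow_tit_recruitment"])
print(sorted(natural), sorted(novel))
```

A compositional account predicts the same response to the natural and the novel sequence, whereas a receiver that had merely memorised whole sequences would have no entry for the novel one.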

In contrast, non-compositionality can be detected when 
the sequence is meaningful but it is impossible to assign a 
semantic value to the component elements in a way that 
derives the meaning of the complex sequence. For example, 
chestnut-crowned babblers (Pomatostomus ruficeps) possess two 
calls composed of the same notes A and B: the flight call 
(AB) is given during flight and the prompt call (BAB) is given 
when provisioning nestlings with food (Engesser et al., 2015).
Given the great difference in the contexts of use (flight versus 
feeding nestlings), there is no clear way to derive these two 
denotations from primitive meanings of the A and B calls. 

Another example is the case of the ‘pyow-hack’ sequences
of putty-nosed monkeys (Arnold & Zuberbühler, 2012).
‘Pyow’ calls are used for general disturbances and attract
the attention of receivers, while ‘hack’ calls indicate eagle
presence and inhibit movement. These calls can be combined
into a ‘pyow-hack’ sequence, which elicits group movement
in receivers in the absence of predators: the meaning of
this combination does not seem to be derived from the meaning
of its constituents. Moreover, playback experiments
showed that listeners responded similarly to sequences of
varying proportion of ‘pyow’ and ‘hack’ calls, further suggesting
that these combinations are non-compositional [but
see Schlenker et al. (2016a) for another interpretation, discussed
further in Section III.5].


(c) Assessing the semantic value of the component parts


As mentioned in the introduction to Section II.1, our seman- 
tic methodology applies primarily to propositional meanings: 
these correspond to features of circumstances that can be 
directly observed, and describe facts that can be evaluated 
as true or false (e.g. ‘There is a leopard’, ‘The signaller is
afraid’). In human language, however, there are meaningful 
signals that do not have this type of meaning — this is the case 
for most words in isolation. For example, one cannot evalu- 
ate the word ‘afraid’ as true or false without knowing which 
individual is being described. The word ‘not’ in isolation also 
cannot be evaluated as true or false. A similar situation may 
hold for the compositional syntax of animal communication: 
even if the meaning of the whole is associated with features of 
circumstances, the meanings of the parts do not necessarily 
have this type of meaning (e.g. a signal in a combination 
could potentially denote negation while the combination as 
a whole denotes the absence of a predator). 

The strategy traditionally adopted by linguists is to establish 
the meaning of a sentence, and then to work backwards to the
meaning of the parts. For example, if we know what ‘It’s raining’
and ‘It’s not raining’ mean, then we can deduce the
meaning of ‘not’. But even for human language, this method- 
ology leaves room for different theoretical possibilities. For 
example, in Italian, the sentence ‘Nessuno ha visto niente’ (lit- 
erally, ‘Nobody has seen nothing’) negates the proposition 
that someone saw something, but it is not immediately obvious
which word in the sentence introduces the negative meaning:
‘nessuno’, ‘niente’, or something else (Giannakidou &
Zeijlstra, 2017). In animals, a similar methodology can be
adopted, but the same questions need to be asked, and while 
our framework allows the debate to be opened, there is cur- 
rently no one-size-fits-all algorithm to answer them. 


(d) Hierarchical structures 


Hierarchical structures are created by syntactic systems in 
which the output of one combinatorial rule is the input for 
a second combinatorial rule. In English, for example, ‘the’ 
combines with ‘bird’ to give ‘the bird’, which can then com- 
bine with ‘sings’ to give the sentence ‘the bird sings’. This 
sentence can be represented hierarchically as [[the bird] 
sings]. While hierarchy allows the possibility of infinite recur- 
sion (e.g. ex-husband, ex-ex-husband, ex-ex-ex-husband, 
etc.), infinite recursion is not a necessary component of hierar- 
chical structure. 
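
The bracketed notation can be mirrored directly by nested tuples, as in the minimal Python sketch below. The representation and the helper names (`depth`, `leaves`, `add_prefix`) are our own illustrative choices rather than a standard formalism; the point is only that the output of one combination serves as the input to the next.

```python
from typing import List, Tuple, Union

# A tree is either a single unit (a string) or a tuple of sub-trees.
Tree = Union[str, Tuple["Tree", ...]]


def depth(node: Tree) -> int:
    """Number of nested combination steps above the deepest unit."""
    if isinstance(node, str):
        return 0
    return 1 + max(depth(child) for child in node)


def leaves(node: Tree) -> List[str]:
    """Linear order of units, discarding the hierarchical grouping."""
    if isinstance(node, str):
        return [node]
    units: List[str] = []
    for child in node:
        units.extend(leaves(child))
    return units


def add_prefix(prefix: str, stem: Tree, times: int) -> Tree:
    """Apply the same combination rule repeatedly to its own output."""
    node = stem
    for _ in range(times):
        node = (prefix, node)
    return node


SENTENCE: Tree = (("the", "bird"), "sings")   # [[the bird] sings]
print(depth(SENTENCE), leaves(SENTENCE))
print(add_prefix("ex", "husband", 3))
```

The sentence has a finite depth of two, which is already hierarchical; the prefix example shows how the same rule could in principle be reapplied any number of times.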

Several kinds of evidence have been used to argue for struc- 
tural hierarchy in the communication of humans and non- 
human animals. One criterion for hierarchy is the presence 
of dependencies between non-adjacent elements. In English, 
for example, the expression ‘either ... or ...’ shows a long- 
distance dependency: the word ‘either’ must be followed by 
the word ‘or’, but the distance between the two can be arbitrarily
large (e.g. ‘Either you tell me what you told your
brother last night or I’ll scream’). This dependency can nevertheless
be stated by a simple hierarchical rule (‘Either S1 or
S2’ is well formed) that refers to large chunks of structure.
Long-distance dependency can be found in canaries (Serinus
canarius): a syllable type can influence the choice of another syllable
type produced up to five syllables later. For example, if a
phrase C precedes the phrase sequence DABN, the following
phrase is likely to be Y (i.e. sequence CDABNY), while if the
phrase N precedes DABN, the following phrase is more likely
to be E (i.e. NDABNE) (Markowitz et al., 2013).
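
A dependency of this kind can be looked for by tabulating which phrase follows a fixed block as a function of the phrase that preceded it. The Python sketch below does this on a handful of invented songs; the song strings, the one-letter-per-phrase coding and the function name are assumptions for illustration and do not reproduce the data or methods of Markowitz et al. (2013).

```python
from collections import Counter, defaultdict
from typing import Dict, List


def next_given_preceding(songs: List[str], block: str) -> Dict[str, Dict[str, int]]:
    """Count which phrase follows `block`, split by the phrase preceding it.

    Each song is a string in which one letter stands for one phrase.
    """
    table: Dict[str, Counter] = defaultdict(Counter)
    for song in songs:
        start = song.find(block)
        while start != -1:
            end = start + len(block)
            # Only count occurrences with a phrase both before and after.
            if start > 0 and end < len(song):
                table[song[start - 1]][song[end]] += 1
            start = song.find(block, start + 1)
    return {before: dict(after) for before, after in table.items()}


# Hypothetical songs using the phrase labels of the example above.
SONGS = ["CDABNY", "CDABNY", "NDABNE", "NDABNE", "CDABNE"]
print(next_given_preceding(SONGS, "DABN"))
```

If the distribution of the following phrase differs markedly between preceding phrases, the choice depends on an element several positions back, which a rule based only on adjacent elements cannot capture.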

Hierarchical structure can also be motivated by identifying
constituents, i.e. substrings of a sequence that function as a
single unit. For example, in the English sentence ‘The bird
sang’, the substring ‘the bird’ is a constituent: it behaves as
a single unit under a variety of manipulations, including permutations
of the sequence (‘What sang was the bird’) and
replacement by other elements (‘It sang’). In chimpanzees,
the combination of ‘hoo’ and ‘panted hoo’ (bigram
HO_PH) can be emitted alone but is also found in the larger
combinations HO_PH_PS or HO_PH_PB (Girard-Buttoz
et al., 2022), suggesting that the bigram HO_PH is a constituent,
which can then combine in a larger syntactic frame.
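
A first, purely distributional screen for candidate constituents is to check whether a sub-sequence occurs both on its own and embedded in longer utterances. The Python sketch below illustrates this on a hypothetical list of utterances using the call labels above; `constituent_evidence` is our own name, and passing this screen is at best suggestive, since the manipulation tests described for human language are not available from counts alone.

```python
from typing import Dict, List, Sequence, Tuple


def _contains(utterance: Sequence[str], candidate: Tuple[str, ...]) -> bool:
    """True if candidate occurs as a contiguous sub-sequence of utterance."""
    k = len(candidate)
    return any(tuple(utterance[i:i + k]) == candidate
               for i in range(len(utterance) - k + 1))


def constituent_evidence(utterances: List[Sequence[str]],
                         candidate: Sequence[str]) -> Dict[str, int]:
    """Count standalone and embedded occurrences of a candidate sub-sequence."""
    target = tuple(candidate)
    alone = sum(1 for u in utterances if tuple(u) == target)
    embedded = sum(1 for u in utterances
                   if len(u) > len(target) and _contains(u, target))
    return {"alone": alone, "embedded": embedded}


# Hypothetical utterances coded with the call labels used above.
UTTERANCES = [("HO", "PH"), ("HO", "PH", "PS"), ("HO", "PH", "PB"), ("PH", "PS")]
print(constituent_evidence(UTTERANCES, ("HO", "PH")))
```

A candidate that is attested both alone and inside larger combinations is a reasonable target for further, experimental tests of constituency.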


III. ANALYSING ANIMAL LINGUISTIC DATA: A TOOLBOX


Animal communication has long been studied with biological 
tools, while human language has always been studied with
linguistic methods. This lack of unity in methodology makes
the two systems difficult to compare. One study recently eval- 
uated human languages with the tools commonly used in ani- 
mal communication. It failed to highlight semantics, 
syntactic structure or vocal learning in human language 
(Prat, 2019). This result is puzzling, and strongly suggests 
that, in order to properly compare animal and human com- 
munication and find linguistic-like capacities in animals, we 
should unify the methods. We describe below a set of 
methods that have been successfully applied to both human 
and non-human communication systems to study semantics, 
syntax and pragmatics. 


(1) Establishing a signal repertoire 


The first step of any animal linguistic investigation is to estab- 
lish a comprehensive repertoire of signals of the species of 
interest. This step involves observations, measurements, 
description of the signals and classification. The relevant 
methodologies have been extensively covered in the etholog- 
ical literature (see online Supporting Information, Table S1). 
In humans, one method to verify that two similar signals are 
the same unit is the contrastive distribution. The typical par- 
adigm consists of replacing one signal by the other, in the 
same environment (e.g. in the same word). If the signals are 
not the same unit (‘contrastive’), this permutation results in 
a change in meaning. For example, English has a contrast 
between /r/ and /l/, which can be highlighted by the fact
that the words ‘row’ and ‘low’ have different meanings. In
Japanese, however, this contrast does not exist: [r] and
[l] are variants of a single liquid consonant, so no contrastive
pairs can be found. 

In animals, contrastive distribution can be used to establish 
solid signal repertoires based on how animals themselves use 
and perceive signals: if two signals are contrastive, they elicit 
a different reaction by the recipient when presented in the 
same environment. 

Contrastive distribution can be used in experimental 
design to verify that units composing a combination are perceived
as the same unit. Chestnut-crowned babblers seemingly
combine the A and B notes into the flight
combination (A-B structure) and into the prompt combination
(B-A-B structure), which have different meanings
(Engesser et al., 2015). The authors exposed subjects to natural
flight and prompt combinations, and artificially rebuilt
combinations (flight combinations made of prompt combina- 
tion units and vice versa). They showed that subjects reacted 
similarly to natural and artificial combinations, suggesting 
that this species combines the same units into different combinations
(see more details in Engesser et al., 2015). However,
this methodology does not necessarily require an experimen- 
tal design. For example, Hobaiter & Byrne (2017) used an 
observational paradigm to establish the gestural repertoire 
of chimpanzees. They showed that wild chimpanzees may 
swing the arm or swing the leg, but that these two gestures 
are not contrastive: the two appear in the same circumstances 
of use so have exactly the same meaning, suggesting that they 
can be considered as the same unit. In contrast, hitting with
the hand versus hitting with the foot have different meanings 
to the chimpanzees, suggesting that these two signals are dis- 
tinct units. 

On the other hand, even when contrastive distributions 
have been established, there may remain analytical choices 
for the theoretician. One such example relates to non- 
compositional syntax, where a syntactically complex signal 
is considered a single unit at the level of semantics. By our 
definition of semantic denotation, the English word ‘cat’ 
has to mean the same thing each time it is used. But the 
‘cat’ in ‘caterpillar’ does not have this meaning. This never- 
theless does not mean that we should revise our definition of 
‘cat’; rather, we should revise what is counted as an occur- 
rence of the signal ‘cat’ to include only cases where it is not 
followed by ‘...erpillar’. An exactly analogous situation holds
for the non-compositional syntax of chestnut-crowned bab- 
blers. The signal AB is used as a flight call, but notably, the 
same string is a subpart of the BAB prompt call. To establish 
the meaning of AB as flight-related, one must make the analytical
decision to not count occurrences of BAB as instances
of the signal AB. 
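
The consequence of this analytical choice can be made concrete with a small counting sketch. It assumes, purely for illustration, that each recorded call is transcribed as a string of note types; the call list and both function names are hypothetical and are not taken from Engesser et al. (2015).

```python
from collections import Counter

# Hypothetical transcriptions: one string of note types per recorded call.
calls = ["AB", "BAB", "AB", "BAB", "BAB", "AB"]

def count_as_substring(calls, unit):
    """Count every position where `unit` appears, even inside longer calls."""
    total = 0
    for call in calls:
        for i in range(len(call) - len(unit) + 1):
            if call[i:i + len(unit)] == unit:
                total += 1
    return total

def count_as_whole_call(calls, unit):
    """Count only the calls that consist of exactly `unit`."""
    return Counter(calls)[unit]

substring_count = count_as_substring(calls, "AB")  # also counts AB inside BAB
whole_count = count_as_whole_call(calls, "AB")     # excludes BAB

print(substring_count, whole_count)
```

Only the second count corresponds to the decision described above; the first would mix flight calls with the AB portion of prompt calls when contexts of use are tallied.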


(2) Computational linguistics to detect syntactic 
patterns 


Often, when data sets are large or when patterns are com- 
plex, one can experience difficulties identifying combinato- 
rial patterns from observations of the data alone. Several 
tools derived from computational linguistics can
help detect and test for specific patterns in animal communication,
such as repetitions, combinations, ordering,
overlapping or temporal structures. These computational
tools are extensively presented, together with the type of
sequences and patterns they are best suited for, in Kershenbaum
et al. (2014a). Of note, Markov models
(Kershenbaum et al., 2014b; Alger, Larget & Riters, 2016;
but see Kershenbaum & Garland, 2015), N-gram models
(e.g. Berthet et al., 2019), transition probabilities
(e.g. Jin & Kozhevnikov, 2011) and collocation analyses
(e.g. Leroux et al., 2021) can all help highlight which signals
are more likely to be combined. String edit distance methods
(e.g. Kershenbaum et al., 2012; Kershenbaum &
Garland, 2015) can allow long sequences of signals to be
compared to detect underlying structure. Hierarchical struc- 
ture can be investigated using a number of different compu- 
tational tools, like entropy estimators (Suzuki, Buck & 
Tyack, 2006), network analyses (Allen et al., 2019), Markovian
processes (Sainburg et al., 2019) and quantification of
clustering events (Kello et al., 2017).
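
As an illustration of the simplest of these tools, first-order transition probabilities can be estimated by counting adjacent pairs of signals. This is a minimal sketch on invented sequences, not the procedure of any of the studies cited above; `transition_probabilities` and the data are hypothetical names and values.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate P(next signal | current signal) from adjacent pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs

# Toy data (assumption): each inner list is one recorded sequence of signals.
sequences = [
    ["A", "B", "B"],
    ["A", "B"],
    ["B", "A", "B"],
    ["A", "A", "B"],
]

probs = transition_probabilities(sequences)
for current, followers in sorted(probs.items()):
    print(current, followers)
```

With real data, such estimates would need to be compared against a baseline (for example, shuffled sequences) before concluding that a combination is more frequent than expected by chance.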


(3) Apparently Satisfactory Outcome to investigate 
semantics 


Investigating the semantics of signals is a difficult task, espe- 
cially for signals that occur in a large diversity of contexts. 
The Apparently Satisfactory Outcome (ASO) is an efficient 
method to investigate the meaning of signals that are emitted
intentionally (Hobaiter & Byrne, 2014), in particular the 
semantic denotation of goal-directed signals (see 
Section II.1.a), or pragmatic inferences involving receiver’s 
behaviour (see Section II.1.4). The ASO is defined as the 
action performed by the recipient that results in cessation of 
signalling by the signaller. This method relies on the assump- 
tion that, in intentional communication, an individual will 
continue to emit a signal until the recipient’s reaction is congruent
with the signal’s meaning, i.e. until the reaction is satisfactory
to the signaller. This ASO is taken to be the
meaning of the signal as intended by the signaller: the seman- 
tic denotation and pragmatic inferences of the signal in a 
population can be derived from ASOs collected across many 
instances and individuals. This method has been successfully 
applied to the gestures of apes (including humans) (Graham 
et al., 2018; Kersken et al., 2018). 
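
The aggregation step can be sketched as a simple tally: for each signal, count which recipient action most often coincided with the end of signalling. The records, gesture labels and function name below are invented for illustration and do not reproduce the coding scheme of the studies cited above.

```python
from collections import Counter, defaultdict

# Hypothetical records: (gesture, recipient action after which signalling stopped).
records = [
    ("arm_raise", "groom"),
    ("arm_raise", "groom"),
    ("arm_raise", "move_away"),
    ("big_scratch", "groom"),
    ("big_scratch", "groom"),
    ("reach", "give_food"),
    ("reach", "give_food"),
    ("reach", "give_food"),
    ("reach", "groom"),
]

def summarise_aso(records):
    """Return, per gesture, the most frequent outcome and its proportion."""
    tallies = defaultdict(Counter)
    for gesture, outcome in records:
        tallies[gesture][outcome] += 1
    summary = {}
    for gesture, outcomes in tallies.items():
        outcome, n = outcomes.most_common(1)[0]
        summary[gesture] = (outcome, n / sum(outcomes.values()))
    return summary

aso = summarise_aso(records)
for gesture, (outcome, proportion) in aso.items():
    print(gesture, outcome, round(proportion, 2))
```

A gesture whose outcomes are spread evenly across several actions would call for more data or a different analysis before a meaning is assigned to it.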


(4) Modelling meaning with truth/applicability 
conditions 


In humans, semantics is often investigated using the truth 
conditions of sentences. Any native speaker of a language 
knows both whether a sentence sounds natural (syntax) and 
also what the world must be like for the sentence to be true 
(semantics). For example, ‘The cat sleeps’ is true when there
is exactly one relevant cat and that cat has the property of 
sleeping. From sentential truth conditions, a linguist can 
work backwards to understand the meaning of individual 
words, by isolating their stable contribution in different sen- 
tences. Writing out explicit statements of truth conditions like 
the following allows sentential truth conditions to be explic- 
itly stated and compared: the sentence S is true exactly when 
conditions C hold. 

It is obviously not possible to ask animals what the mean- 
ing of a signal of theirs is. Also, it remains unknown whether 
any cognitive processes exist in non-human animals that correspond
to the notion of ‘truth’ for sentences of human language.
One solution proposed by Schlenker et al. (20160) is
to investigate the conditions under which signals are applica- 
ble and inapplicable using natural observations and experi- 
ments. The meaning of a signal can thus be written as: the 
signal S is applicable exactly when conditions C hold. 
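
A hypothesis of this form can be written as a predicate over situations and checked against observations. The sketch below is only illustrative: the signals `S1` and `S2`, the situation features and the hypothesised conditions are all invented, and it tests only one direction of the definition (the signal was emitted, so its condition should hold).

```python
# Hypothetical observations: the signal emitted and features of the situation.
observations = [
    {"signal": "S1", "predator": "raptor", "location": "canopy"},
    {"signal": "S1", "predator": "raptor", "location": "canopy"},
    {"signal": "S2", "predator": "cat", "location": "ground"},
    {"signal": "S2", "predator": None, "location": "ground"},
    {"signal": "S1", "predator": "cat", "location": "ground"},
]

# Candidate hypotheses: signal S is applicable exactly when condition C holds.
hypotheses = {
    "S1": lambda w: w["predator"] == "raptor",
    "S2": lambda w: w["location"] == "ground",
}

def counterexamples(observations, hypotheses):
    """Return observations where a signal was emitted although its condition fails."""
    return [w for w in observations if not hypotheses[w["signal"]](w)]

violations = counterexamples(observations, hypotheses)
print(len(violations), violations)
```

Counterexamples of this kind suggest that the hypothesised condition is too narrow. Cases where the condition holds but the signal is absent are harder to interpret, since a signal may be withheld for pragmatic reasons.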

This framework allows researchers to formulate precise theories
about the use and structure of animals’ signals, regardless
of the cognitive capacities of the species, and derive testable
hypotheses about the semantics and syntax of signals. This
method can be applied to any type of animal semantics (see 
Section II.1), including signals whose denotation seems gen- 
eral or unclear (see Dezecache & Berthet, 2018), and has 
been applied to the vocal systems of several primates 
(Schlenker et al., 20165; Berthet et al., 2019a). 


(5) Principles of competition in pragmatics 


As discussed in Section II.1.4, meaning in human and animal 
communication is often enriched by pragmatic processes. 
One important insight in pragmatics is the principle of competition
between alternatives: inferences are frequently made
based on what could have been said, but was not 
(Grice, 1957). For example, imagine that Mary and John 
have a dog named Max. John, looking out the window, says 
‘A dog is playing in the garden’. When hearing John’s sen- 
tence, Mary will probably infer that John is not watching 
Max, but another dog that he does not know. In particular, 
if the dog in the garden were Max, the sentence ‘A dog is 
playing in the garden’ would still be true, but John would 
be unlikely to say it, because there is a simpler and more 
informative alternative that he could say instead: ‘Max is
playing in the garden’. Since he did not say this sentence, 
Mary infers that it is not true. In this context, the meaning 
of John’s sentence is pragmatically enriched: ‘A dog is play- 
ing in the garden’ is applicable if there is a dog playing in 
the garden, and it is not Max. The fact that John does not 
know the dog is a pragmatic inference, not part of the semantic
denotation of the sentence.

This example illustrates that, to fully understand a system 
of communication, it is crucial to posit a division of labour
between the semantic denotation of a signal and further 
pragmatic inferences. This difficulty is particularly problem- 
atic for researchers in animal linguistics, who have only 
access to observations and experiments to derive conclusions 
about the semantics of a signal, and can draw limited inferences
about the pragmatic mechanisms at play in other species.
To help in this matter, Schlenker et al. (20160)
postulated three pragmatic principles that can be applied to 
any system to unveil the distinction between pragmatics 
and semantics. 

The informativity principle postulates that when one signal is
strictly more informative than another, the more informative
one is used whenever possible. This leads to the assumptions
that (i) a signaller does not emit a signal S in a situation W if a
strictly more informative alternative S′ is applicable in W,
and (ii) a receiver should infer that if S is emitted, every
strictly more informative alternative S′ is non-applicable in
W. This was argued to be the case in titi monkeys, whose
A-calls refer to serious threats, while B-calls refer to all note- 
worthy events but are never used for serious threats because 
A-calls are more appropriate (Gommier & Berthet, 2019). 
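
The informativity principle can be sketched computationally by treating the semantic denotation of each signal as the set of situations in which it is applicable. Under that assumption, a signal is strictly more informative than another when its denotation is a proper subset of the other's, and the pragmatically enriched meaning of a signal is its denotation minus the situations covered by strictly more informative alternatives. The situation labels below are invented placeholders and are not the categories used in the titi monkey studies.

```python
# Hypothetical denotations: the set of situations in which each call is applicable.
denotation = {
    "A": {"serious_threat_1", "serious_threat_2"},
    "B": {"serious_threat_1", "serious_threat_2", "minor_threat", "feeding"},
}

def strictly_more_informative(alternative, signal, denotation):
    """True if `alternative` applies in a proper subset of the situations of `signal`."""
    return denotation[alternative] < denotation[signal]

def enriched_meaning(signal, denotation):
    """Situations where `signal` applies and no strictly more informative call does."""
    blocked = set()
    for alternative in denotation:
        if alternative != signal and strictly_more_informative(alternative, signal, denotation):
            blocked |= denotation[alternative]
    return denotation[signal] - blocked

enriched_A = enriched_meaning("A", denotation)
enriched_B = enriched_meaning("B", denotation)
print(sorted(enriched_A))
print(sorted(enriched_B))
```

In this toy setting the general call B keeps its broad semantic denotation, while its predicted use is restricted to situations that are not serious threats, which mirrors the pattern described above.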

The urgency principle postulates that urgent information 
(e.g. nature or location of a threat) should be communicated 
as soon as possible in a sequence. As a consequence, signals 
conveying non-urgent information are used later in the 
sequence. This principle has been used to explain the use of 
‘hack’ calls of putty-nosed monkeys, which are related to 
aerial predation situations when used alone or at the start 
of a sequence, but which convey information about non- 
ground movements when following other calls (Schlenker 
et al., 2016b,a).

These principles are completed with assumptions about 
the subject’s world knowledge. World knowledge is crucial 
for the receivers to extract information from the utterances. 
In the example above, Mary is aware that John knows 
Max, which allows her to draw precise inferences about the 
meaning of the sentence ‘A dog is playing in the garden’. In
animals, this knowledge can include the ecology of the spe- 
cies, evaluation of the dangerousness of predators, kinship 
and affiliative ties among other individuals, etc. (see 
Section II.1.4). For example, when hearing calls conveying 
information about the presence of a serious threat (A-calls), 
titi monkeys look up, probably because they know that serious
threats are raptors (Schlenker et al., 2017; Berthet
et al., 2019a). 

While these principles remain to be experimentally con- 
firmed (but see Narbona Sabaté et al., 2022), they have 
helped shed light on the vocal systems of several species of 
monkeys (Schlenker et al., 2016; Dezecache & 
Berthet, 2018). 


(6) Collecting animal data: where to start? 


Investigating animal linguistics is challenging because it 
requires a good understanding of basic linguistic concepts, 
but also involves collecting data on the behaviour of animals. 
As such, researchers involved in animal linguistics should be 
familiar with the basic methodology for collecting and pro- 
cessing animal behavioural data, and be aware of its common 
pitfalls and difficulties. 

Designing and conducting a study with animals requires 
specific training and knowledge, as well as specific consider- 
ations. These aspects have already been extensively covered 
in the literature, so we will not repeat them here, but we provide
a table of references (see Table S1) that may be useful to
researchers who are new to the field.


IV. CONCLUSIONS 


(1) Animal linguistics is a challenging domain that requires a 
good knowledge of both linguistics and animal cognition. 
The study of the evolution of language will benefit from gen- 
uine inter-disciplinary collaboration. 

(2) One threat is the misapplication of linguistic jargon to ani- 
mal communication systems. Here, we proposed clear defini- 
tions of core concepts in animal linguistics (‘semantics’, 
‘pragmatics’ and ‘syntax’). For each concept, we provide cri- 
teria that need to be fulfilled to draw reliable comparisons 
between human and animal communicative systems. 

(3) Another difficulty arises with the choice of relevant and 
efficient tools to detect linguistic capacities in non-human sys- 
tems. We reviewed several methods that have already been 
successfully applied to non-human signals. We hope and 
expect that additional tools will be developed in future col- 
laborations between linguists and biologists. 

(4) A final difficulty comes with the collection of behavioural 
data on non-human animals. We provide a list of useful ref- 
erences for researchers with little practical knowledge. 

(5) This primer aims at encouraging interdisciplinary collaboration,
promoting mutual respect among fields and stimulating
respectful discussion. We hope it will help the nascent
field of animal linguistics thrive and contribute to exciting
discoveries on the parallels between animal communication 
systems and reveal the evolutionary history of language and 
other communicative systems. 